Distributional Similarity Models: Clustering vs. Nearest Neighbors

Author

  • Lillian Lee
Abstract

Distributional similarity is a useful notion in estimating the probabilities of rare joint events. It has been employed both to cluster events according to their distributions, and to directly compute averages of estimates for distributional neighbors of a target event. Here, we examine the tradeoffs between model size and prediction accuracy for cluster-based and nearest neighbors distributional models of unseen events.
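To make the contrast in the abstract concrete, the following is a minimal sketch of the two styles of estimate for an unseen pair (x, y). It is not the method from the paper: the Jensen-Shannon divergence, the `1 - divergence` weighting, the hand-written cluster assignment, and all names (`PAIRS`, `nn_estimate`, `cluster_estimate`) are assumptions chosen for illustration. The nearest-neighbors model keeps one distribution per event, while the cluster model keeps only one pooled distribution per cluster, which is the model-size side of the tradeoff.

```python
import math
from collections import Counter, defaultdict

# Toy (event, context) observations; purely hypothetical data.
PAIRS = [
    ("apple", "eat"), ("apple", "peel"),
    ("pear", "eat"), ("pear", "peel"), ("pear", "slice"),
    ("car", "drive"), ("car", "park"),
]


def conditional_dists(pairs):
    """Maximum-likelihood estimate of P(y | x) from observed pairs."""
    counts = defaultdict(Counter)
    for x, y in pairs:
        counts[x][y] += 1
    return {
        x: {y: c / sum(ys.values()) for y, c in ys.items()}
        for x, ys in counts.items()
    }


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    total = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mean = 0.5 * (pk + qk)
        if pk > 0:
            total += 0.5 * pk * math.log2(pk / mean)
        if qk > 0:
            total += 0.5 * qk * math.log2(qk / mean)
    return total


def nn_estimate(dists, x, y, k=2):
    """Average P(y | neighbor) over the k events closest to x.

    Weights are 1 - JS divergence, an assumed weighting scheme.
    """
    others = [o for o in dists if o != x]
    others.sort(key=lambda o: js_divergence(dists[x], dists[o]))
    weights = {o: 1.0 - js_divergence(dists[x], dists[o]) for o in others[:k]}
    norm = sum(weights.values())
    if norm == 0:
        return 0.0
    return sum(w * dists[o].get(y, 0.0) for o, w in weights.items()) / norm


def cluster_estimate(pairs, assignment, x, y):
    """Estimate P(y | x) from the pooled counts of x's (given) cluster."""
    pooled = Counter(yi for xi, yi in pairs if assignment[xi] == assignment[x])
    total = sum(pooled.values())
    return pooled[y] / total if total else 0.0
```

For example, `nn_estimate(conditional_dists(PAIRS), "apple", "slice", k=1)` borrows the estimate from the single closest event, whereas `cluster_estimate(PAIRS, {"apple": 0, "pear": 0, "car": 1}, "apple", "slice")` uses the pooled distribution of the whole cluster.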

Similar resources

An Adaptive Spectral Clustering Algorithm Based on the Importance of Shared Nearest Neighbors

The construction of a similarity matrix is a significant step in the spectral clustering algorithm, and the Gaussian kernel function is one of the most common measures for constructing it. However, with a fixed scaling parameter, the similarity between two data points is not adaptive and is not appropriate for multi-scale datasets. In this paper, through quantitating the value ...

Combining Syntactic Co-occurrences and Nearest Neighbours in Distributional Methods to Remedy Data Sparseness.

The task of automatically acquiring semantically related words has led people to study distributional similarity. The distributional hypothesis states that words that are similar share similar contexts. In this paper we present a technique that aims at improving the performance of a syntax-based distributional method by augmenting the original input of the system (syntactic co-occurrences) wit...

Improvement of Jarvis-Patrick Clustering Based on Fuzzy Similarity

Different clustering algorithms are based on different similarity or distance measures (e.g. Euclidean distance, Minkowski distance, Jaccard coefficient, etc.). The Jarvis-Patrick clustering method uses the number of common neighbors among the k-nearest neighbors of objects to disclose the clusters. The main drawback of this algorithm is that its parameters determine a too crisp cutting criter...

FLAG: Fast Large-Scale Graph Construction for NLP

Many natural language processing (NLP) problems involve constructing large nearest-neighbor graphs between word pairs by computing distributional similarity between word pairs from large corpora. In this paper, we first describe a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data in a memory- and time-efficient manner, FLAG maintains...

Spatio-Temporal Outlier Detection Technique

Outlier detection is a very important function of data mining and has a wide range of applications. This paper proposes a clustering-based approach to outlier detection using spatio-temporal data. It uses a three-step approach to detect spatio-temporal outliers. In the first step of outlier detection, clustering is performed on the spatio-temporal dataset with the proposed Spatio-Temporal Shared Nearest ...

Publication date: 1999